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ABSTRACT 

We describe and compare several post-correlation radio frequency interference classifi- 
cation methods. As data sizes of observations grow with new and improved telescopes, 
the need for completely automated, robust methods for radio frequency interference 
mitigation is pressing. We investigated several classification methods and find that, for 
the data sets we used, the most accurate among them is the SvrmTriresfiold method. 
This is a new method formed from a combination of existing techniques, including a 
new way of thresholding. This iterative method estimates the astronomical signal by 
carrying out a surface fit in the time-frequency plane. With a theoretical accuracy of 
95% recognition and an approximately 0.1% false probability rate in simple simulated 
cases, the method is in practice as good as the human eye in finding RFI. In addition 
it is fast, robust, does not need a data model before it can be executed and works in 
almost all configurations with its default parameters. The method has been compared 
using simulated data with several other mitigation techniques, including one based 
upon the singular value decomposition of the time-frequency matrix, and has shown 
better results than the rest. 
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1 INTRODUCTION 

Around 1980 when the radio spectrum was becoming more 
and more occupied as a result of technical advancements 
(Pan konin fc Price] Il98l| ), radio observers started to mit- 
igate the radio freq uency interference (RFI) caused by 
electr onic equipment (|Thompson. Gergelv. fc Vanden Bout! 
Il99ll ). Until recently, on-line thresholding and manual flag- 
ging of post correlated data used to be sufficient to suppress 
RFI artefacts in the data. However, as the volume of data 
and the required sensitivity of observations increased signif- 
icantly, and the contamination of RFI through an increased 
usage of electronic equipment grew, new methods had to be 
developed to deal with the situation. 

RFI mitigation can be applied in two different stages: 
a pre-correlation stage and a post-correlation stage. The 
pre-correlation mitigation stage is very powerful as the 
observational data is still available at its highest time 
resolution. For example, there are methods that blank 
or subtract short periodic radar R FI bursts on-line 
|Niamsuwan. Johnson, fc Ellingsonll2005l ) , leaving the astro- 



nomical signal intact with only a very slightly increased sig- 
nal to noise ratio. Any residual RFI has to be removed dur- 
ing the data reduction or imaging stage, which is often per- 
formed manually, for example by finding appropriate clip- 
ping levels for contaminated baselines until the reduced data 
is free of artefacts. Pre-correlation methods have to handle 
large amounts of data in a very short time and, because of 
hardware constraints, they can often only access limited di- 
mensions of the data, such as the data from a single dish or 
station, or the data from a small time range. 

Several methods from signal processing have been 
used to perform the first pre-correlation mitigation 
stage. Ex amples of these are thresholding using \ 2 ~ 
statistics (We ber et al.l 1 19971) or a Neyman- Pearson de- 
tector (jLeshem et all 2000l ): spatial filtering with eigen- 
value decomposition of a spatial correlati on matrix 
l|Leshem et afl feoOO; Smolders fc Hamp son 2002) or by sub- 
space tr acking dEUingson fc Hampson [20021 ) : the CUSUM 
method (|Baan. Fridman. fc Millenaarl |2004|): and adaptive 
cance llation with a reference antenna (|Barnbaum fc Bradley! 
1998). In the post-correlation phase, manual flagging is often 
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the only option, but the use of an independent RFI referenc e 
signal to subt ract the RFI Ifjriggs. Bell, fc Kestevenfe oOO), 
fringe fitting [|Athrevall2009h and post-correlation spatial fil- 
tering are possible. However, none of the above are applica- 
ble or sufficient in all cases or for all types of RFI. 

In modern observatories that operate at low frequencies, 
such as the Westerbork Synthesis Radio Telescope (WSRT), 
the Giant Metrewave Radio Telescope (GMRT), the Low 
Frequency Array (LOFAR), and the Expanded Very Large 
Array (EVLA), RFI mitigation is an essential component in 
the signal processing. In the case of LOFAR, there are high 
sensitivity requ irements, especial l y for the Epoch of R eion- 
ization project (|jelic et all |200c1) ; lTnomas et all (120091 ) ). in 
what might be a busier RFI environment, with data sets up 
to a petabyte in s ize. RFI mitigation b efore correlation re- 
mains important l|Boonstra et al.ll2005h . yet the amount of 
data will be too large for manual post-correlation flagging, 
implying the need for automated fl agging strategies. 

RFI come s in many forms (Fri dman fc Baanl l200ll ; 
iLemmonl [l997) . The strong RFI that is problematic is of- 
ten either local in frequency, such as RFI caused by tele- 
vision stations, aeroplanes and radar, or is local in time, 
e.g., broadband RFI caused by phenomena such as light- 
ning, high-voltage power cables and electrical fences. Some- 
times, the frequency of RFI drifts with time as shown later 



in Figure 5(a) This can be caused by Doppler shifting of 
a satellite signal or by imperfect transmitters. A different 
class of RFI is caused by weakly transmitting, but station- 
ary and therefore systematic, devices on site. This class of 
RFI is hard to recognize, as it might cover all the channels 
in a spectral band. In fringe stopping interferometers, the 
fringe rotation causes this type of RFI t o have a sinusoida l 
response in the time-frequency domain (|Thompsonl 1 19821 ) . 
It can be recognized and subt racted in variou s ways, as for 
example described recently by I Athreval |2009l ). 

To select an RFI mitigation strategy, several consider- 
ations should be taken into account: 

• The true/false-positive ratio of the RFI classification; 

• The speed of the algorithm; 

• Detection or recovery, i.e., whether detection and flag- 
ging of contaminated areas is sufficient. In certain situations, 
it might be necessary to recover contaminated data, i.e., to 
subtract the RFI from the data; 

• The effects of RFI mitigation on the noise. For example, 
a difference in the observed noise level caused by RFI will 
be fatal for the L OFAR Epoch of Reionization experiment 
l|jelic et al.ll2008l ). 

In this paper we will evaluate the effectiveness of sev- 
eral automatic post-correlation RFI mitigation methods and 
their combinations. The methods will be compared with 
each other in order to be able to pick a general optimal 
post-correlation RFI strategy. We will do this by testing the 
methods on both artificial data and data from WSRT that 
has been observed in the frequency range of LOFAR. Testing 
the methods on WSRT data will also provide an indication 
of the effects of the RFI environment on future LOFAR ob- 
servations. 

In the next section we describe a new method of flag- 
ging RFI. We present our results, including the comparative 
study, in section [3] and discuss the results in section [4] In 



section [5] we discuss some future directions for further work 
in this area. 



2 METHODS 

Radio astronomers have developed their own ways of dealing 
with RFI during data reduction using numerous astronom- 
ical software packages. In many cases, this implies flagging 
by hand - a tedious and time consuming job. Many toolkits, 
such as MPS 1 , AIPS++ 2 , mipjad 3 and NEWSTAR 4 , provide 
specific features to perform flagging, such as the FLAGR task 
in AIPS+ + . Astronomers have automated the process fur- 
ther by designing scripts in which common signal processing 
techniques such as thresholding, s moothing, line detection 
and c urve fitting are combined dWinkel. Kerp. fc Stankol 
120061 ; iBhat. Cordes. Chatteriee. fc Laziol 120051 '). Another 
common signal processing technique known as Singular 
Value Decomposition has recently been used for the au- 
tomatic removal of broadband RFI in GMRT observations 
l|Pen et al.1 120091 ). We will describe some of the techniques 
available that relate to a new method of interference mitiga- 
tion that we will introduce, and finally we will explain the 
new method itself. 



2.1 Post-correlation thresholding 

Since RFI increases the measured absolute amplitude of a 
signal, thresholding is an effective method that is often used 
to remove strong RFI. The threshold level is often glob- 
ally determined, or sometimes set relative to the variance 
or mode distribution parameters per baseline. These can be 
stably estimat ed using, for e xample, the Winsorized vari- 
ance or mode (iFridmanl koOS). All values that are outside a 
certain range around the mean or median are flagged as bad 
data and not used in subsequent data reduction. Sometimes 
a number of samples around a bad data sample are flagged as 
well. Most astronomical reduction toolkits provide options 
to threshold part of a data cube, allowing different thresh- 
olds at the cost of an increased effort for the astronomer. An 
important consequence of thresholding is that good data is 
selected with a bias. When many non-contaminated samples 
are above the threshold, they will be flagged and not used 
in subsequent data reduction. As a result, artefacts such as 
incorrect flux densities might be caused in the image plane. 
It is therefore important to have a low false-probability rate 
of RFI detection. 



2.2 Surface fitting and smoothing 

A surface fit to the correlated visibilities V(v, t) as a function 
of frequency v and time t can produce a surface V(u, t) that 



1 AIPS: Astronomical Image Processing System, 
|http : / /aips . nr ao . edu/ | 

^ AIPS+- 1-, sequal of AlPS, |http://aips2.nrao.edu/| 

3 MIRIAD, a data reduction package tailored for the Australia 
Telescope Compact Array (ATCA), 

http : //www . atnf . csiro . au/computing/sof tware/miriad/ 

4 NEWSTAR, a data reduction package tailored for the Westerbork 
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represents the astronomical information in the signal. Re- 
quiring V(y, i) to be a smooth surface is a good assumption 
for most astronomical continuum sources, as their observed 
amplitudes tend not to change rapidly with time and fre- 
quency, whereas specific types of RFI can create sharp edges 
in the time-frequency domain. Because of the smoothing in 
both time and frequency direction, this method is not di- 
rectly usable when observing strong line sources or strong 
pulsars. The residuals between the fit and the data contain 
the system noise N no i se (v,t) and the RFI, NnFi(u, t), which 
can then be thresholded without the chance of flagging astro- 
nomical sources that have visibilities with high amplitude. 

Several suitable surface fitt ing methods exist. As an 
example, in IWinkel et alj (|2006t ) a pipeline is described in 
which a two-dimensional, low order, dimensional indepen- 
dent polynomial is iteratively fitted to time-frequency tiles 
in the data using a least square fit: 

JV„ N t 

Vk{v,i) = ^2<i k: iV l + ^2b k ,tt l + c k , (1) 

i=l i=l 

where Vk is the fitted surface that represents the astro- 
nomical information in the fc-th tile, N v ,Nt are the poly- 
nomial order for the frequency and the time, respectively, 
and ah,i,bk,i, Ck are the coefficients of the fit for tile k. 

The fit is performed iteratively, and values which have 
been flagged in previous iterations are excluded from the fit. 
This can be done by introducing a weight function Wf{v, t), 
where Wf(v, t) = indicates that the value is flagged or 
outside the boundaries of the measured time or frequency 
range, and Wf(v, t) — 1 means the value is accepted. The 
fit is performed by minimizing an error function Ek for each 
tile: 



Ek = J2J2 Wf ^ *)/(^*(". *). V{v t t)) 



(2) 



where f(a, b) = a 2 — b 2 for a least squares fit or f(a, b) — 
\a — b\ for a fit with a minimal absolute error. 

An example of this approach after a few iterations can 
be seen in Figure [1] In simple cases, the surfaces that are 
created with this approach represent the astronomical infor- 
mation reasonably well, and the method is also quite fast. 
However, as polynomial fits tend to show deviations near 
boundaries, the method is inaccurate near the boundaries 
of each tile. 

Compared with tile-based approaches, sliding window 
methods tend to be more accurate. A simple example of 
a sliding window approach is to calculate the average of a 
window of size N x M around each data value: 



with 



1 



count 



J2 J2 W P -V(v + iAv,t + jAt), 



(3) 



count 



lAT \M 

W F {v + iAv,t + jAt) (4) 

=-lNj=-lM 



This method is still fast and creates a surface without 
tile edges. However, the sliding window average represents 
the astronomical signal less well. For example, peaks in the 
original function cause square-shaped edges in the fit, which 
in the end cause classification inaccuracies. 




(a) V(u,t) 



(b) V(v, t) - V(u, t) (c) Thresholded 



Figure 1. Tile-based polynomial fitting applied to the raw visi- 
bilities from an observation of 3C196 at 140 MHz using a 144m 
WSRT baseline (see JO}. Panel [(a)] shows the tiled fit of the 
astronomical signal. Panel |(b)| shows the difference between the 
fitted astronomical signal and the observed signal used for thresh- 
olding. Panel [(c)] shows the flags on top of the original signal. The 
flags established by single pixel thresholding cover the RFI when 
verified by eye, although many false-positives can be seen which 
are caused by ("normal") noise. The tile size used for this image 
is 30 frequency channels with 10 kHz width X 50 time scans with 
10s integration time. 



One way to overcome this problem is to calculate the 
local median instead of the local average. Values that have 
been flagged in a previous iteration should be ignored by 
the median calculation. The median is insensitive to peaks 
and the surface created by the local median remains smooth 
when the window is slid over the data. The median however 
is not always a good estimate of the sliding window centre 
sample specifically, as all samples have equal weight. 

Another way to overcome the problem is to calculate a 
weighted average. Consider a weight function Wd(i,j) that 
depends on the two components i,j that represent the dis- 
tances from the centre of the window in time and frequency 
respectively. Then 

lM w d {i,i)(w F QV){v t ,t 3 ) 

V(u, t) = l —* N J -' M . ^ (5) 



weight 



where 



veight = ^2 W d (i,j)W F (v + iAv,t+jAt) (6) 



2 J 2 



This can be calculated very fast, since ^ is the convolution 
operation Wd * (Wf V) and © is another convolution 
Wd * Wf, giving: 



V = {(Wf V) * W d ) {W F * W d ) 



(7) 



where and are the elementwise multiplication and divi- 
sion operators. A good choice for Wd is the two-dimensional 
(dimensional independent) Gaussian function: 



W d (i,j) = exp 



3 

2a? 



(8) 



Together, equations ([7]) and (J5J essentially describe a 
weighted Gaussian smoothing operation, or more specifi- 
cally, a Gaussian smoothing operation with missing data. 
The parameters o v and at can be used to specify the level 
of smoothing in frequency and time respectively. Since the 
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weight function is dimensionally separable, the convolutions 
can be dimensionally separated: 

p = {W F QV)*Uv*Ut , . 

W F *U v *U t 

with U u (i) = W d (i,0) and Z7 t (i) = W d (0,j). Each of the 
convolutions in §§§ is a one dimensional convolution, and 
this is therefore a fast operation. 

2.3 The cumulative sum method 

The cumulative sum (CUSUM) method is a well known analy- 
sis method used to detect changes in distr ibution parameters 
jPagdll954l ; iBasseville fc Nikiforov|[l993l ) , such as in quality 
control in production environments. If the cumulative sum of 
sequential samples exceeds an adaptive threshold, the sys- 
tem enters an alarmed state and changes can be made to 
correct the quality. In its common form, the likelihood for 
two distribution parameters is used to compute the thresh- 
old. 

To turn this method into an RFI mitigation strategy, 
the total observed power or power received at a certain fre- 
quency by a single dish can be used as the sequential input 
values to the CUSUM method. The likelihoods of either vari- 
ance or mean of RFI can be estimated using the variance 
of the signal (|FriedmarJll996l ; iBaan et alj|2004 ). Observing 
can be stopped as soon as RFI is detected, and can con- 
tinue when reception has returned to normal. This method 
can be easily implemented for on-line RFI detection, as it 
is simple and fast. However, the CUSUM method does not 
estimate the start time of the change, it only detects the 
change quickly, which nevertheless may cost time and thus 
some bad data may leak through before the method detects 
faint RFI. Hence, the method is more applicable to a first 
check on the data than to actually perform flagging. The 
subsequent sections will describe a method that combines 
the detection strength of the CUSUM method with the possi- 
bility of performing flagging off-line. 



of an RFI sequence in the frequency direction, the following 
rule is applied: 

flagi/M (v, t) = 3i e {0 . . . M - 1} : Vj € {0 . . . M - 1} : 

\R(v+(i-j)Au,t)\> X M (10) 

where M is the number of samples in a combination. The 
flagging rules for the time direction are correspondingly de- 
termined. Finally, a sample is flagged if any of the two 
rules is satisfied. We will call this method the Var Threshold 
method. 

We will show a simple example to demonstrate the 
method. Consider the following values: 

/l 2 1 4\ 
R= 4 1 1 4 (11) 
\2 2 1 4/ 

Each row represents a frequency channel and each column 
represents a time scan. Assume the high values in the fourth 
column were caused by broadband RFI. When using a nor- 
mal threshold x = 3, all samples with value 4 would be 
thresholded, including one false-positive. However, if we 
used combinatorial thresholding, with xi = 5 and ^2 = 3, 
we would threshold only the three broadband RFI samples. 

The above text suggests an implementation of this 
method by a procedure which iterates over all samples and, 
for each sample, checks if it and its M € Ai neighbours form 
an RFI sequence in one of the directions. Alternatively, an 
implementation can start by marking all samples above a 
certain XM as candidates. Subsequently, only the marked 
candidates that form a connected segment with more than 
M connected samples in an orthogonal line in one of the 
directions are flagged. This procedure is repeated for all 
M 6 M. From this perspective, it is easy to add other mor- 
phological constraints. Instead of looking for straight lines in 
the time and frequency direction, an enhanced version might 
flag connected shapes covering a specific area, or shapes that 
form a line or curve in the plane, possibly not connected, 
that are likely to be caused by RFI. 



2.4 Combinatorial thresholding 

RFI bursts often affect multiple samples which are connected 
either in frequency or time. We will now introduce a new 
threshold mechanism that makes use of this knowledge: we 
will flag a combination of samples when a property of this 
combination exceeds some limit. Assume that A and B are 
neighbouring samples. In normal thresholding, we will look 
at each of the samples A and B individually and flag one 
of them if it exceeds some "single sample" threshold xi- 
For combinatorial thresholding, a new flagging criterion is 
added: if A and B do not exceed the single sample threshold 
Xi individually, they can still be flagged when A and B both 
exceed a somewhat lower threshold X2- If not, they can be 
combined with a third neighbour, C, and thresholded at X3i 
etc. The more connected samples are combined, the lower 
the sample threshold. 

Given a set of strictly decreasing thresholds, {x»}j = i> a 
value will be classified as RFI if it belongs to a combination 
of i values in either the time or frequency direction in which 
all absolute values are above the threshold Xi- To determine 
whether a single sample R(v,t) should be flagged because 



2.4-1 VarThreshold parameters 

The following list of parameters need to be optimised to 
make efficient use of this approach: 

• The false-positive rate on uncontaminated samples. The 
lower the value, the more RFI remains. The higher the value, 
the more uncontaminated samples will be flagged. We will 
discuss this in i]2.4.2l 

• A set that defines which samples are combined. For 
this we define Ai, a set containing the number of sam- 
ples that will be combined in each of the four directions. 
Ideally, each sample will be combined with all samples of 
either the same frequency or the same time, i.e., Ai = 
{i £ Z : 1 < i < max(JV„, At)}, with Z the set of integers. 
Empirically, a small subset Ai = {1, 2, 4, 8, 16, 32, 64} works 
almost as well and saves summing and comparing many sam- 
ples. 

• The set of thresholds {\m ■ M 6 Ai} for the different 
number of combinations M. The total set of thresholds is 
expressed by two parameters, xi (the threshold on a single 
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sample) and p, using the following formula: 

A value of p = 1.5 empirically seems to work well for the 
Var Threshold and the below denned SumThreshold method. 
To find Xi f° r a desired false probability rate, p is kept con- 
stant and the xi value is binary searched by performing mit- 
igation on data selected from the distribution of the noise, 
with the values {Xi\i<zM computed as in (|12[) . until the false 
probability rate is close to the desired rate. 

Since the method is combined with a surface fitting strategy, 
the following parameters are added: 

• The number of iterations to be performed. The resulting 
accuracies are good with about 5 iterations. 

• The iteration sensitivity as a function of the iteration 
number, rj(i). In each iteration, the threshold sensitivity is 
increased (more samples are flagged). To accomplish this, all 
the thresholds {xi}i€M are decreased by dividing them by a 
factor of r/(i). Only during the last iteration will a sensitivity 
of 100% be used. By slowly increasing the sensitivity a first 
bad fit to the background won't have much effect, since only 
the very strongly RFI contaminated samples are removed. 
Using an exponential function for rj(i) was found to work 
well. 

2.4-3 The VarThreshold false-positive ratio 

Assume that R ~ T>(aN s ), where R is the residual of the 
complex correlated visibilities V and the surface fit V, and 
T> is a distribution with parameter a. The probability that 
a non-RFI contaminated sample from the residual is larger 
than x can be determined with: 

— X oo 

VuVt : P (\R(u,t)\ > x) = J <Pa(x)dx + J ip a {x)dx, (13) 

-oo X 

where <p(x) is the probability density function of the distri- 
bution T>(aj^j). Note that the term ip a (x)dx is only rel- 
evant when the distribution contains negative values - unlike 
the Rayleigh distribution - and the values are thresholded 
above x as weu as below — x- 

The combined threshold false-positive rates can best 
be calculated numerically, since an analytical calculation is 
rather complex, even for M with a single combined thresh- 
old xm- This analytical calculation will be demonstrated 
for M — 2. First it is assumed that any two samples, 
R(vi,ti) and R(v2,t2), are independent when they are not 
RFI contaminated. This is the case if the fit represents the 
astronomical data and system noise is uncorrelated. With 
this assumption, the probability Pf a i se for a single non- 
contaminated sample Ri with M = 2 to be flagged in one 
of the four combinations with its neighbours R2...5 can be 
calculated with: 

Pfaise (M) 

P(ftagi/ M =2(M) V flagiAf =2 (",*)) 
P(|Jii|>XA3ie[2...5]:|B i |>x) 
= P(\R\ > X )-P(\R\ >x)(l-P(|i?| >x)) 4 - (14) 

The corresponding formulae for larger M are more complex. 
When M contains more than one element, the false-positive 
ratios for the elements Mi can not be simply added to obtain 
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Figure 2. The false-positives of the VarThreshold method when 
flagging with a single combination M = {M} without surface 
fitting. Samples were selected from a Rayleigh distribution, which 
is the distribution of the visibility amplitudes. \ is relative to the 
mode of the distribution. 

the combined false-positive ratio, as P(flag!/M; {v, t)) and 
P(flag^Mj iy, t)) are not statistically independent: both will 
at least make use of sample R(v,t). Given this, the analyt- 
ical expression becomes rather complex and the probability 
is evaluated numerically. 

Figure [2] shows the result of calculating the total false- 
positive ratio numerically, for several values of M. 



2.5 The SumThreshold method 

Now we will present a variation on the VarThreshold 
method that improves the classification performance. This 
method, named the SumThreshold method, is a flagging 
method that combines samples as in the VarThreshold 
method. In this alternative case, the sum of a combination 
of one or more other samples is used as a threshold crite- 
rion. As in the VarThreshold method, the threshold XM is 
variable and depends on M, the number of samples that are 
summed. 

Unlike the VarThreshold method, this approach al- 
lows the flagging of a sequence of samples when it contains 
samples with values below the sequence threshold value. 
However, without an additional rule, there are situations 
in which this method might flag too many samples. For 
example, consider the sequence [0, 0, 5, 6, 0, 0] that repre- 
sents a strong RFI contamination in two samples. When 
the SumThreshold method without a second rule is applied 
with average threshold values xi = 7, X2 = 5, X3 = 4, 
. . . , X6 = 1.8, all six values would be thresholded, as their 
average exceeds 6x6- The following rule is therefore added: 
the values are thresholded in the increasing order xi, X2, 
. . . , Xm- When a lower threshold has already classified sam- 
ples as RFI contaminated, the samples will be left out of 
the sum and replaced by the average threshold level. In the 
example case, the values 5 and 6 will be classified as RFI 
by the second threshold, and therefore will be replaced by 
X6 when combining all the six samples. The average of the 
sequence for the sixth threshold is therefore calculated as 
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Figure 3. The probability of a false-positive when thresholding 
with a single combination M = {M} using the SumThreshold 
method without surface fitting. The Rayleigh distribution was 
used for the simulation, x ls th e average threshold relative to the 
distribution mode. Thus a combination of samples was thresh- 
olded when their sum exceeds \ X M X a. The false ratio for 
M > 2 is different from the Var Threshold method (Figure [2]). 
Because of this difference, the parameter p used to calculate the 
set of thresholds as in \12) needs to be optimised for the meth- 
ods individually. Although the false ratio is not decreased when 
comparing this method with the Var Threshold method, the true 
ratio is often increased (Figure [SJ> . 



(0 + + xe + X6 + + 0) /6 = !%6- As a consequence, only 
the samples with values 5 and 6 are flagged. 



2.5.1 The SumThreshold false-positive ratio 

We calculate the theoretical false-positive ratio for M — 2 
as for the Var Threshold method. The probability P(T X , 1.2) 
that the sum of two independent random samples exceeds a 
certain value \ is given by: 

Vviv 2 tit 2 : P(T x ,i, 2 ) = P[R(u 1 ,t 1 ) + R(u 2 ,t 2 )>x] 

= P(V(2a Ns )> X ) 
oo 

= J <p2a(x)dx (15) 

X 

When thresholding the average of a combination of two 
samples, each sample will occur four times in a hypothesis 
test, once with each of its neighbours. On uncontaminated 
samples, the probability of a false-positive for each of these 
tests is given by (|15[) . The probability for a false-positive 
with the four tests applied on each sample becomes: 

f(T x ,ix4) = P(T xA ,2 VT x ,i,3 VT XlM VT x ,i, B ) 

Because the tests {T Xt i > i} i=2 are dependent on each 
other, it is much easier to calculate the false-positive 
rates numerically. This can be performed by applying the 
SumThreshold on a large amount of data selected from the 
distribution T>. The result of such a simulation is in Figure[3] 



Figure 4. The distribution of the singular values of two artifi- 
cial measurements: one containing Gaussian noise only, the other 
containing Gaussian noise poluted by broadband RFI. In this ex- 
ample, the first five singular values are affected by the broadband 
RFI. In general, the number of singular values that are affected 
by RFI and the possibility to recognize them varies depending on 
the orthogonality properties of the RFI. 

2.6 Singular Value Decomposition 

Singular value decomposition (SVD) is a mathematical tool 
for finding the singular values of a matrix, which can exhibit 
certain properties of the matrix. 

A singular value decomposition consists of finding the 
complex unitary M x M and N x N dimensional matrices 
U and V containing respectively a left and right singular 
vector in each row, and the diagonal, M x N dimensional 
real matrix E containing the singular values, such that: 

A = UT,V T (16) 

RFI is mitigated from the data set by performing this de- 
composition on a matrix A. Each element Aij represents the 
measured flux, where i is a baseline-frequency index and j 
a time index. Each given matrix A has a unique solution 
for the singular values E, if the singular values are sorted, 
but there is no unique solution for U and V (for example, 
A remains equal when all values in U and V are negated). 
It is assumed that the highest singular values represent the 
singular values of the RFI data. To mitigate the RFI, the 
highest singular values in E are set to zero and the new 
matrix A is recomposed from U, E and V. 

The number of singular values to be removed or set to 
zero has to be chosen in such a way that only the RFI is re- 
moved. The singular values that correspond to RFI are often 
strong outliers, whereas the singular values of Gaussian noise 
form a smooth curve. The position of the abrupt change in 
the curve of the singular values is used as the number of 
singular values to be removed, as is shown in Figure 2] 

2. 6. 1 Properties 

Let L = min(M, N), then: 

L 

Aij =^^UikT,kkVjk- (17) 
fc=i 
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U and V are unitary, UU = I with U the Hermitian trans- 
pose, and the rows and columns of the matrices form by 
definition a complex orthonormal basis. This implies: 

L 

Vi 6 [1..M] : ^2 Ufj = 1. (18) 

3=1 

Hence there is at least one non-zero value in each row and 
column of the matrices U and V, and setting a non-zero 
singular value to zero changes A. If A contains real values 
only, equation (JTSJ implies that all values in U and V are 
between —1 and 1, and removing a singular value Ej, can 
alter each value in A by at most £;;. In the complex case, 
removing a singular value can alter the absolute value of 
each value in A at most by £«. In general, setting E,j to 
zero subtracts a matrix Ti with rank 1 from A, as (1^)^ = 
UjiSuVki, and thus all columns are linearly dependent. 

The orthogonality properties imply that the order of the 
rows and columns in the original matrix A do not change 
the singular values: the order of the rows and columns is ir- 
relevant for the SVD method to detect RFI. Intuitively, the 
SVD method does not "distinguish" between a smoothly 
increasing amplitude, caused by astronomical sources, and 
RFI, and might fail to correctly subtract or detect RFI be- 
cause of the astronomical signal. 

If RFI is to be separated from the signal, the RFI and 
the signal have to adhere to the following properties: 

• All columns containing RFI (and consequently all rows) 
have to be orthogonal to the astronomical signal. In other 
words, for any column or row a in the matrix, ap^pj ■ 
a signal — 0> w ith a RFl the RFI component and a a ig na ] the 
signal component in the data. 

• The singular values of the RFI are substantially higher 
than the singular values of the astronomical signal. This 
requires the RFI to be strong. 

• The individual RFI columns are either fully linearly de- 
pendent on or fully orthogonal to each other. If the indi- 
vidual RFI components are partially dependent, the largest 
part of the RFI is removed and the singular value of what is 
left of the RFI might not have enough 'gain' to be removed 
or flagged. 

Iteratively fitting a surface and subtracting the surface, 
as in £12.21 might improve the compliance to the first re- 
quirement, although it increases the execution time of the 
method. Another way to improve compliance to the require- 
ment is to remove the astronomical signal by subtracting a 
good model beforehand. 

It is useful to note that unitary transformations do not 
change the singular values of a matrix, although they might 
change the singular vectors. Since the Fourier transform is a 
unitary transformation according to Parseval's theorem, the 
following equation holds: 

A = USV & T{A) = U'SV' (19) 

The consequence of this is that it does not matter whether 
the SVD method is executed in the time-frequency domain, 
the time-lag domain, or another Fourier domain, since set- 
ting singular values to zero in the Fourier domain would set 
the singular value to zero in the original domain. 



2.7 Input data types 

The combined thresholding methods described in this paper 
can be applied to several types of data: auto-correlated or 
cross-correlated, to specific polarizations or Stokes parame- 
ters, to amplitude or to phase, etc. 

We have compared flagging on cross-correlations and 
auto-correlations. The cross-correlations of each baseline can 
be processed with one of the flagging methods, resulting in 
N(N— 1) /2 correlations to be processed. Alternatively, every 
antenna can be individually flagged by processing the auto- 
correlations, and samples in a baseline might be flagged if ei- 
ther of the corresponding samples in the individual antenna 
auto-correlations have been flagged. Only N correlations 
need to be searched for RFI in this case. In addition to the 
benefit of speed, RFI is strongest in auto-correlations and 
the data contain no fringes from astronomical sources, as 
auto correlations do not have interference patterns, thus of- 
fering an improved accuracy in RFI detection. On the down 
side, some RFI might be present in auto-correlations that 
would have been mitigated by cross-correlation, and detect- 
ing RFI in auto-correlations potentially throws away some 
usable data in the cross-correlations. 

In cases where the polarization of the observed electro- 
magnetic waves is measured, the polarization might contain 
valuable information for RFI classification. For now, we have 
processed each polarization individually, without exploiting 
relationships between polarizations. 



3 RESULTS 

3.1 Surface fitting results 

In i]2.2l we described several surface fitting methods to esti- 
mate the astronomical signal in the frequency-time domain. 
We found that the surface fitting methods when combined 
with one of the classification methods do not differ much in 
accuracy. A sliding window approach was found to be more 
stable compared with a tile based approach. The Gaussian 
weighted average, a polynomial fit and the window median 
for the subtracted surface were found to be approximately 
equal in their accuracies after optimising their parameters 
such as the window size, the Gaussian kernel size and the 
order of the polynomial, although their parameters do influ- 
ence the accuracy. 

Finding global parameters that always work well (or 
automatic procedures to find the parameters) is not trivial. 
The algorithm can handle data with very different charac- 
teristics: it can be applied to XX, XY, YX or YY polariza- 
tions, auto- or cross-correlations from either long or short 
baselines, for LOFAR or for WSRT data, before or after cal- 
ibration, etc. To use the same surface fitting parameters in 
all these different situations, the window size, and if appli- 
cable the Gaussian kernel size, needs to be rather small. The 
expected amplitude changes of celestial signals are usually 
much less in the frequency direction, and setting the win- 
dow size larger in the frequency direction improves stability. 
We used a typical size of the sliding window of 40 frequency 
channels x 20 time scans and Gaussian kernel parameters 
of a v — 15 and at = 7.5. The numbers are based on trials 
using different observed and artificial data sets. The param- 
eters are relative to the number of channels and number of 
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100 200 300 400 500 600 700 100 200 300 400 500 600 700 

Time(x10s) Time(x10s) 

(a) Original observation (b) After removing the highest singular values from the image 

(note the different flux scale). 



Figure 5. The auto-correlated data in this image demonstrate the inability of the SVD method to remove sources that slowly change 
frequency over time (e.g., because the source has a changing velocity in the direction of the antenna). This type of RFI seems to be 
relatively common in low frequency WSRT data. The RFI in this particular example is so strong that it can be easily removed by 
thresholding, but this plot is to illustrate the effects of such RFI. When the frequency-changing signal is faint and cannot be removed 
by thresholding, applying SVD will, as in this example, change the astronomical information in the data in an unpredictable way. 




(a) Recomposed image from the low singular values. 




(b) Recomposed image from the high singular values. 



Figure 6. SV decomposition of test set A (Figurc [7(a)[ l: noise with broad band RFI covering all channels homogeneously. The recomposed 
image from the low singular values (top panel) looks very promising: none of the RFI is left and the noise seems to be untouched. However, 
a recomposition of the matrix with only high singular values (bottom panel), i.e., the part that has been subtracted from the image, 
shows that the noise is affected in an unpredictable way by the decomposition. This is the best case for the SVD method; in more realistic 
scenario's, the data should include a residual astronomical signal and broadband RFI that might not be linearly dependent. 



time steps. For WSRT data, a channel is 10 kHz wide and 
a time scan is 10 seconds long. LOFAR will have a 1 kHz 
x 1 second correlation output resolution. For best results, 
the length and width of the window should be about three 
times the Gaussian kernel size or larger. 



3.2 RFI classification results 

Both the SVD and threshold methods show accurate re- 
sults on removing line RFI and broadband RFI. The SVD 
method is not suitable for removing frequency-varying RFI, 
as demonstrated in Figure [5j and thus has to be comple- 
mented with other techniques to remove all RFI. However, 
the SVD method can be used to subtract and remove the 
RFI from the image, leaving the astronomical signal intact. 
For this to be succesful, considerable assumptions about the 



mathematical properties of RFI and the astronomical signal 
have to be true: the time-frequency matrix with the RFI 
components has to be orthogonal to the time-frequency ma- 
trix of the astronomical signal, and the different RFI compo- 
nents have to be either orthogonal to each other or linearly 
dependent on each other. Figure |S] shows the SVD decom- 
position of test set A that consists of uncorrelated noise and 
linear RFI. 

As it is hard to quantitatively compare RFI mitigation 
methods based on data sets of which the characteristics of 
the RFI cannot be known for certain, several artificial test 
sets were created. These sets are shown in Figure [7] and con- 
tain broadband RFI only. Since the RFI was added artifi- 
cially, the location of the RFI in the time-frequency domain 
is known, and the accuracy of the methods can be tested 
quantitatively. The results are drawn as receiver operating 
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1 I ' 

(a) Test set A: noise with broadband RFI contaminating (b) Test set B: broadband RFI contaminating a part of 
all channels, ordered from strong (left) to weak (right), the channels 




(c) Test set C: broadband RFI contaminating different (d) Test set D: a simulated observation of the cross- 
channels correlation of three point sources being close together 

added to test set C 




(e) Test set E: a simulated observation of the cross- (f) Test set F: as E, but RFI as in test set C 

correlation of five distant sources added to test set A 




(g) Test set G: as F, but Gaussian smoothed before (h) Test set H: a high frequency background signal 
adding RFI added to test set C 



Figure 7. The artificial test sets containing broad band RFI, used for testing and parameter optimisation. In all images, time is along 
the horizontal and frequency along the vertical axis. All test sets simulate a similar baseline. 



characteristic (ROC) curves in Figure [8] ROC curves show 
the true probability rate against the false probability rate. 
The different accuracies and characteristics of the methods 
can easily be compared in ROC graphs. 

The SumThreshold method shows a considerably better 
accuracy in all the test sets. Test sets A and B contain RFI 
that is completely linear dependent, and the SVD method 
also works very well in these sets. The SVD method could 
actually be used to subtract the RFI instead of flagging and 
not using the data. However, to mitigate the RFI in test 



set C, the methods have to deal with RFI that is neither 
orthogonal nor completely dependent on each other, and 
thus the accuracy of the SVD method decreases. 

A normal thresholding strategy was also tested to com- 
pare the results. When performing normal thresholding with 
a surface fit as in the SumThreshold method, the accuracy 
for thresholding actually decreases in the test cases with- 
out an astronomical signal (see the curves labelled "Fit + 
simple threshold" in Figure [8}. This is partially because the 
surface fit was optimised for the SumThreshold method. Fur- 



10 A.R. Offringa et al. 



ROC curve -- Test set A 




0.2 0.4 0.6 0.8 1 
False positives (%) 



ROC curve -- Test set B 




J VarThreshold 

1 >— Median filter 

0.2 Simple threshold 

Fit + simple threshold 
SumThreshold (sliding median) 



ROC curve -- Test set C ROC curve -- Test set D 




False positives (%) False positives (%) 



ROC curve - Test set E 



ROC curve - Test set F 



100 




100 



0.2 0.4 0.6 0.8 
False positives (%) 



o 

CL 




0.2 0.4 0.6 0.8 
False positives (%) 



ROC curve -- Test set G 



ROC curve -- Test set H 



100 



80 



20 
































I 





100 



0.2 0.4 0.6 0.8 
False positives (%) 




0.2 0.4 0.6 0.8 
False positives (%) 



Figure 8. The ROC curves produced by applying various RFI detection methods to the test sets. The closer an ROC curve passes the 
top-left of the graph at 100% true-positives with 0% false-positives, the more accurate is the method. 



thermore, since the accuracy of the thresholding is not very 
good, the fit is influenced by the undetected RFI, causing 
more errors. 

When astronomical information is added as in test set 
E and a more complex background is added as in test set F, 
the SVD method shows a decreased accuracy in mitigating 
the RFI, as can also be seen in Figure(9] However, in test set 
G, the background of test set F is Gaussian smoothed and 
subtracted, as is done before thresholding. The SVD method 
now shows an improved accuracy, though still not as good 
as the SumThreshold method. Test set H shows that the 
linear dependency of the RFI is not the only requirement 
for succesful mitigation with the SVD method: the added 
RFI is completely linearly dependent in this test set, but 
the background is still causing low accuracies in the SVD 
method. 

It should be noted that some of these test sets are mea- 
suring the theoretical accuracy of non-orthogonal, but not 
completely independent RFI contamination. As shown in 
i|2.6l this was the hardest case for the SVD method. When 
in practice the RFI does behave in an orthogonal or depen- 
dent manner, the results might be quite different. Never- 
theless, it is unlikely that all RFI contaminations that are 
measured by different antennae at different times are always 
either linearly dependent or orthogonal. 

The presented test sets simulate a single baseline, 
whereas in a real measurement, the SVD method will exploit 
the correlation of RFI between different antennae. This will, 
however, also decrease the probability that all RFI is either 
orthogonal or linearly dependent. 

3.3 Automatic flagging of WSRT data 

To test the various RFI flagging algorithms we have used 
WSRT data in the LFFE band from 138-157 MHz obtained 
in November and December 2 007. The observations have 
been described and analysed bv lBernardi et al] (|2009al lbh to 
which we refer for details of the astrophysical motivation 
and calibration. For our analysis, however, we used the raw 
uncalibrated visibilities. The correlator integration time for 
the data was 10s. A total of 8 bands of 2.5 MHz width were 
available. The central frequencies of these bands were lo- 
cated at 139.3, 141.5, 143.7, 145.9, 148.1, 150.3, 152.5 and 
154.7 MHz. Each band was divided into 512 spectral chan- 
nels. The data were Hanning tapered, yielding an effective 
spectral resolution of 9.8 kHz. Therefore, adjacent spectral 
channels are highly correlated. A total of 13 telescopes par- 
ticipated in the observations providing a total of 78 inter- 
ferometers with baselines from 36 to 2736 meters. All four 
cross-correlations between the orthogonal, linearly polarized 
feeds were used in the analysis. 

We have tested the various methods on several data 
sets. The SumThreshold method in combination with Gaus- 
sian smoothing shows especially excellent results. Figure [TOl 
shows a typical time-frequency diagram of WSRT data at 
~140 MHz and the application of the SumThreshold method. 
Although the smoothed surface is slightly affected by the 
RFI after five iterations, as faint artefacts are visible in the 
smoothed surface around places where RFI used to be, the 
effect is so small that it does not pose a problem for the 
SumThresholding method. However, it makes the calculated 
false probability rate inaccurate, as the false probability cal- 
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(a) Original 




(b) Automated flagging result 



Figure 11. Time (horizontal) vs. frequency (vertical) plots of 
WSRT data, cross-correlations of antenna 1 vs. 2: a particular 
bad band at 121.3 MHz - 123.7 MHz of an observation of 3C147, 
showing that the method remains robust in one of the worst cases 
at the WSRT. 



culations assume independence between the residual sam- 
ples. When validating the results by visual inspection, we 
see far less false detections than the calculated false proba- 
bility rate. 

We were able to use the same parameters for any sit- 
uation in the WSRT data, and therefore were able to com- 
pletely automate the flagging process. Even at baselines and 
frequencies with dramatic RFI contamination of up to 50%, 
the SumThreshold flagging method remained stable and ac- 
curate. Figure [TT1 shows, for example, a badly contaminated 
band of WSRT data that is almost perfectly RFI flagged. 



4 CONCLUSION AND DISCUSSION 

In this article we have shown several approaches to deal 
with RFI that is left after correlation. The results show that 
automated flagging with the SumThreshold method works 
well for broadband and peak RFI. In all cases, the default 
parameters for the method work well, although parameter 
tweaking might in some cases improve the classification. In 
the artificial broadband RFI situations, it detects 80% of 
the artificially inserted RFI with less than 0.1% error, and 
often approaches a 99% recognition almost without error. 
The accuracy of this method is therefore as good as can 
be expected from manual flagging. In the case of WSRT, 
the new method does not improve the dynamic range of the 




(b) Automated flagging result 




(c) Smoothed 



(d) Difference 



Figure 10. Time (horizontal) vs. frequency (vertical) plots of uncalibrated WSRT data, cross-correlations of antenna C vs. D. and the 
application of the SumThreshold automated flagging procedure. Panel [(a)] shows one hour of the amplitude of a 3C196 observation, panel 



shows the result of the flaj 
and panel [(c)] 



ger, panel [(c)] shows the fitted surface after 5 iterations, and panel [(d)] shows the difference between panel 



data compared with manual flagging, but the method saves 
a considerable amount of work. 

New telescopes such as LOFAR and SKA require ro- 
bust automatic procedures, as these telescopes will produce 
data sets that exceed current measurements in volume by 
orders of magnitude. The ability to flag or check baselines 
or subbands individually will be lost. 

The ROC analysis shows that the SumThreshold 



method is to be preferred above the Var Threshold and SVD 
methods. The SVD method can be used in some respects to 
detect RFI, but is less accurate. It can either be used to 
detect the RFI or to correct samples. If it is used to correct 
samples by filtering the RFI out, rather than only detecting 
and flagging it, artefacts with unknown characteristics could 
remain in the data. For WSRT data, these artefacts look as 
bad as the broadband RFI itself. 
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All methods have been tested without assuming a data 
model. Subtracting the model before RFI detection might 
improve the classification further. Nevertheless, the detec- 
tion accuracy with and without a model do not differ much. 
As such, going back and forth between flagging data and 
creating a model is not necessary in most cases. 



5 FURTHER WORK 

RFI with a moderate strength that can be detected by eye 
was found to be of no concern for automatic flagging meth- 
ods in sensitive telescopes such as WSRT. However, a dif- 
ferent kind of RFI might still pose problems. Certain weak 
RFI, such as radiation that leaks from cabins in situ, might 
be present in many channels for a substantial duration of 
the observation. This might pose problems for observations 
that require long integration times to achieve their required 
signal-to-noise ratios, such as the LOFAR-EoR project. If 
the RFI is persistent in time, systematic errors could result. 
There are some interesting ways to remove these, and one of 
th em is the fringe- fitting RFI mitigation method described 
by lAthreval (|2009t ). Although this technique works at the 
GMRT, preliminary tests with the fringe-fitting RFI miti- 
gation method on WSRT and LOFAR data do not show a 
strong presence of this type of RFI, and removing very weak 
RFI with a similar method requires more work. Therefore, 
to determine whether this type of RFI is really present, and 
whether it might be removable is yet to be seen. 

An important next step is to consider practical issues in 
RFI mitigation techniques. For example, the effects of many 
RFI mitigation methods, post as well as pre-correlation, 
need to be simulated, since we never know what the image 
plane ought to look like. Also, which post and pre-correlation 
methods can be combined? Under which practical circum- 
stances do RFI mitigation methods fail? How can we be sure 
that astronomical detections are not caused by RFI, or by 
the methods that try to reduce RFI? Answering these ques- 
tions is important for establishing the reliability of new RFI 
mitigation methods and for their regular use by astronomers. 

Although, at this point, it seems to be of little concern 
to improve the SumThreshold automatic flagging method 
any further, it might be interesting to improve it by com- 
bining more information for detection and by using fuzzy 
logic to decide the sample classification. An interesting ex- 
ample would be to include phase information in the recog- 
nition, as only the amplitude information has been used so 
far by the threshold methods. For example, Figure IT21 shows 
that the phase contains valuable information about a sam- 
ple: in uncontaminated samples, the phase is likely to be 
near zero rotation, whereas many contaminated samples do 
have a phase deviating from zero. Other distinguishing infor- 
mation could be contained in the polarization information 
per sample and in the combination of different baselines. 

Based on the low frequency observations with the 
WSRT, it can be expected that the radio environment of 
LOFAR is sufficiently clean for sensitive astronomical exper- 
iments. In a future paper we will fully analyse and describe 
the LOFAR environment and the effectiveness of the RFI 
strategies. 

Finally, we would like to emphasise that the method- 
ology of RFI flagging, or any kind of error detection, needs 
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Figure 12. Typical histogram of the phase in a short baseline 
of a WSRT observation. The RFI was detected by using the 
SumThreshold method. The plot implies that RFI-contaminatcd 
samples have a much higher probability to have a phase deviating 
from zero, and the phase thus contains valuable information for 
RFI detection. 



to change because of the introduction of telescopes such as 
LOFAR, that generate so much data that it is not possi- 
ble for astronomers to browse through the data for "the 
baseline that was producing this artefact" or "the timestep 
that corresponds to these stripes in my image". Therefore, 
another important next step is to be able to automatically 
detect errors that are caused by RFI, calibration issues, bro- 
ken hardware, faulty software or any step in the complicated 
pipeline of a radio observatory. 



SOFTWARE 

Software to flag measurement sets with the SumThreshold 
method and other discussed methods has been made pub- 
licly available and can be downloaded from the following 
location: 

http : //www. astro .rug.nl/rfi-sof tware/ 
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